BUG: Fix get dummies unicode error #22131
pandas/tests/reshape/test_reshape.py

```python
df = pd.DataFrame({'x': [u'ä']})
result = pd.get_dummies(df)
expected = pd.DataFrame({u'x_ä': [1]}, dtype=np.uint8)
assert_frame_equal(result, expected)
```
This one would pass even without a fix, but I've included it for completeness.
More coverage is (almost) never a problem for us 🙂
Codecov Report

```diff
@@            Coverage Diff             @@
##           master   #22131      +/-   ##
==========================================
- Coverage   92.06%   92.06%   -0.01%
==========================================
  Files         169      169
  Lines       50689    50693       +4
==========================================
+ Hits        46667    46670       +3
- Misses       4022     4023       +1
```
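The headline coverage figure is, in effect, the share of tracked lines that were hit. The sketch below recomputes both columns from the table, assuming a plain hits/lines ratio truncated to two decimals (Codecov's exact rounding rule is not shown on this page). It also checks that the PR's 4 new lines split into 3 hits and 1 miss. Both runs land at 92.06%, with unrounded values of about 92.065% and 92.064%; how Codecov arrives at the displayed -0.01% delta is not visible here.

```python
import math


def coverage_pct(hits, lines):
    """Line coverage in percent, truncated to two decimals (assumed rule)."""
    return math.floor(hits / lines * 100 * 100) / 100


# (hits, misses, lines) taken from the Codecov table for master and the PR
master = (46667, 4022, 50689)
pr = (46670, 4023, 50693)

for hits, misses, lines in (master, pr):
    # every tracked line is either a hit or a miss
    assert hits + misses == lines

# the PR adds 4 lines: 3 covered, 1 not covered
assert (pr[2] - master[2], pr[0] - master[0], pr[1] - master[1]) == (4, 3, 1)

print(coverage_pct(master[0], master[2]), coverage_pct(pr[0], pr[2]))
```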
Branch updated from b6995f9 to 07975d6.
pandas/tests/reshape/test_reshape.py

```python
expected = pd.DataFrame({u'x_ä': [1]}, dtype=np.uint8)
assert_frame_equal(result, expected)

df = pd.DataFrame({'x': ['a']})
```
Can you parametrize this test?
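One way to act on this request is to fold the single-case tests into a single pytest-parametrized test. The sketch below is illustrative, not the merged test. It assumes Python 3, pytest and a current pandas. It adds the non-ASCII column-name case (`{u'ä': ['a']}`) from the later test and passes `dtype=np.uint8` explicitly, because pandas 2.0 and later default dummy columns to `bool`. The parameter names `data` and `expected_col` are hypothetical.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize("data, expected_col", [
    ({"x": ["ä"]}, "x_ä"),  # non-ASCII value, ASCII column name
    ({"x": ["a"]}, "x_a"),  # ASCII value and ASCII column name
    ({"ä": ["a"]}, "ä_a"),  # non-ASCII column name, used as the prefix
])
def test_dataframe_dummies_unicode(data, expected_col):
    df = pd.DataFrame(data)
    # dtype is passed explicitly so the expected frame does not depend on
    # the default dummy dtype of the installed pandas version
    result = pd.get_dummies(df, dtype=np.uint8)
    expected = pd.DataFrame({expected_col: [1]}, dtype=np.uint8)
    pd.testing.assert_frame_equal(result, expected)
```

pytest collects one test item per tuple, so a failure report names the input that broke.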
pandas/core/reshape/reshape.py

```python
              else '{prefix}{sep}{level}' for v in levels]
dummy_cols = [dummy_str.format(prefix=prefix, sep=prefix_sep, level=v)
              for dummy_str, v in zip(dummy_strs, levels)]
py2_prefix_is_unicode = isinstance(prefix, text_type)
```
Can you make a small helper function here, so that this can be written as a list comprehension?
pandas/core/reshape/reshape.py

```diff
@@ -923,11 +923,17 @@ def get_empty_Frame(data, sparse):

number_of_cols = len(levels)

py2_prefix_sep_is_unicode = isinstance(prefix_sep, text_type)
```
Make this explicit by also checking PY2.
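Taken together, the two review requests (a small helper usable in a list comprehension, and an explicit PY2 check) could look roughly like the sketch below. This is not the merged code. `make_col_name` is a hypothetical name, and the local `PY2` and `text_type` definitions stand in for the compat names the diff uses. The idea is that on Python 2, formatting a non-ASCII unicode prefix, separator or level into a byte-string template can raise `UnicodeEncodeError`, so the helper switches to a unicode template whenever any part is unicode. On Python 3 every `str` is text and that branch never runs.

```python
import sys

PY2 = sys.version_info[0] == 2
# `unicode` is only evaluated on Python 2; Python 3 takes the `str` branch
text_type = unicode if PY2 else str  # noqa: F821


def make_col_name(prefix, prefix_sep, level):
    """Hypothetical helper: build one dummy column name."""
    fstr = '{prefix}{prefix_sep}{level}'
    # On Python 2 only: use a unicode template if any part is unicode
    if PY2 and any(isinstance(part, text_type)
                   for part in (prefix, prefix_sep, level)):
        fstr = u'{prefix}{prefix_sep}{level}'
    return fstr.format(prefix=prefix, prefix_sep=prefix_sep, level=level)


# The per-level check collapses into one list comprehension
levels = ['a', 'ä']
dummy_cols = [make_col_name('x', '_', level) for level in levels]
```

Checking all three parts inside one helper also covers the case this PR targets, a unicode prefix taken from the column name, which a check on the level alone would miss.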
pandas/tests/reshape/test_reshape.py

```diff
@@ -302,6 +302,26 @@ def test_dataframe_dummies_with_categorical(self, df, sparse, dtype):
expected.sort_index(axis=1)
assert_frame_equal(result, expected)

def test_dataframe_dummies_unicode(self):
    df = pd.DataFrame(({u'ä': ['a']}))
```
Reference the issue number in a comment above this line.
Branch updated from 50baa9a to 15f2946.
Thanks @Scorpil!
* master: (47 commits)
  - Run tests in conda build [ci skip] (pandas-dev#22190)
  - TST: Check DatetimeIndex.drop on DST boundary (pandas-dev#22165)
  - CI: Fix Travis failures due to lint.sh on pandas/core/strings.py (pandas-dev#22184)
  - Documentation: typo fixes in MultiIndex / Advanced Indexing (pandas-dev#22179)
  - DOC: added .join to 'see also' in Series.str.cat (pandas-dev#22175)
  - DOC: updated Series.str.contains see also section (pandas-dev#22176)
  - 0.23.4 whatsnew (pandas-dev#22177)
  - fix: scalar timestamp assignment (pandas-dev#19843) (pandas-dev#19973)
  - BUG: Fix get dummies unicode error (pandas-dev#22131)
  - Fixed py36-only syntax [ci skip] (pandas-dev#22167)
  - DEPR: pd.read_table (pandas-dev#21954)
  - DEPR: Removing previously deprecated datetools module (pandas-dev#6581) (pandas-dev#19119)
  - BUG: Matplotlib scatter datetime (pandas-dev#22039)
  - CLN: Use public method to capture UTC offsets (pandas-dev#22164)
  - implement tslibs/src to make tslibs self-contained (pandas-dev#22152)
  - Fix categorical from codes nan 21767 (pandas-dev#21775)
  - BUG: Better handling of invalid na_option argument for groupby.rank(pandas-dev#22124) (pandas-dev#22125)
  - use memoryviews instead of ndarrays (pandas-dev#22147)
  - Remove depr. warning in SeriesGroupBy.count (pandas-dev#22155)
  - API: Default to_* methods to compression='infer' (pandas-dev#22011)
  - ...
```shell
git diff upstream/master -u -- "*.py" | flake8 --diff
```